1 Executive summary

The US Department of Energy has launched the Materials Project initiative, which provides open source data on thousands of materials. In this analysis, I focus on the MP Batteries dataset, which has been released as part of the project. The aim of the analysis is to present the characteristics of the batteries. The analysis focuses on 4 different aspects: distribution of attributes, correlations between attributes, characteristics depending on working ion and predictions.

2 Technical insights

The analysis is performed with the R language. I used the following packages:
- dplyr to clean up the dataset and get better control over the data,
- kableExtra, ggcorrplot, GGally and plotly to prepare visualisations,
- caret, randomForest, RRF to run regressions on the dataset.

To ensure the reproducibility of my work, I set the seed (initial state for random number generation) to 379.
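
As a minimal sketch of what fixing the seed guarantees, the snippet below resets the seed before a random draw and checks that two runs agree. The helper name `draw_sample` is hypothetical and exists only for this illustration; the seed value 379 is the one used in this report.

```r
# Hypothetical helper: reset the seed, then draw a random sample.
draw_sample <- function(seed, n = 5) {
  set.seed(seed)           # fix the initial state of the random number generator
  sample(1:100, size = n)  # any random operation after this point is repeatable
}

first_run  <- draw_sample(379)
second_run <- draw_sample(379)

# Both runs start from the same generator state, so they match exactly.
identical(first_run, second_run)
```

Note that a seed only makes results repeatable when the random steps that follow it are executed in the same order each time.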

3 Dataset description

The MP Batteries dataset comes from the Materials Project website. It consists of 4351 observations and 17 variables: 1 identifier, 4 string variables, 1 discrete variable and 11 continuous variables. There are no missing values in the dataset.

X <- read.csv("./data/mp_batteries.csv", header = TRUE, sep = ",")

X %>%
  head %>%
  kable %>%
  kable_styling("striped", full_width = F) %>% 
  kableExtra::scroll_box(width = "800px")
Battery.ID Battery.Formula Working.Ion Formula.Charge Formula.Discharge Max.Delta.Volume Average.Voltage Gravimetric.Capacity Volumetric.Capacity Gravimetric.Energy Volumetric.Energy Atomic.Fraction.Charge Atomic.Fraction.Discharge Stability.Charge Stability.Discharge Steps Max.Voltage.Step
mp-30_Al Al0-2Cu Al Cu Al2Cu 3.043399 0.0890331 1368.481 5562.790 121.84009 495.27253 0.0 0.6666667 0.0000000 0.0000000 1 0
mp-1022721_Al Al1-3Cu Al AlCu Al3Cu 1.243653 -0.0215863 1112.937 4418.980 -24.02423 -95.38962 0.5 0.7500000 0.0740612 0.0962458 1 0
mp-8637_Al Al0-5Mo Al Mo Al5Mo 4.762574 0.1227568 1741.504 7175.702 213.78156 880.86651 0.0 0.8333333 0.4114601 0.0452120 1 0
mp-129_Al Al0-12Mo Al Mo Al12Mo 12.723893 0.0431214 2298.811 7346.232 99.12801 316.78006 0.0 0.9230769 0.0000000 0.0114456 1 0
mp-91_Al Al0-12W Al W Al12W 12.494598 0.0292342 1900.745 7332.719 55.56677 214.36621 0.0 0.9230769 0.0000000 0.0000000 1 0
mp-1055908_Al Al0-12Mn Al Mn MnAl12 18.236156 0.0397314 2547.693 7592.916 101.22330 301.67688 0.0 0.9230769 0.1454643 0.0000000 1 0
colSums(is.na(X))
##                Battery.ID           Battery.Formula               Working.Ion 
##                         0                         0                         0 
##            Formula.Charge         Formula.Discharge          Max.Delta.Volume 
##                         0                         0                         0 
##           Average.Voltage      Gravimetric.Capacity       Volumetric.Capacity 
##                         0                         0                         0 
##        Gravimetric.Energy         Volumetric.Energy    Atomic.Fraction.Charge 
##                         0                         0                         0 
## Atomic.Fraction.Discharge          Stability.Charge       Stability.Discharge 
##                         0                         0                         0 
##                     Steps          Max.Voltage.Step 
##                         0                         0
str(X)
## 'data.frame':    4351 obs. of  17 variables:
##  $ Battery.ID               : chr  "mp-30_Al" "mp-1022721_Al" "mp-8637_Al" "mp-129_Al" ...
##  $ Battery.Formula          : chr  "Al0-2Cu" "Al1-3Cu" "Al0-5Mo" "Al0-12Mo" ...
##  $ Working.Ion              : chr  "Al" "Al" "Al" "Al" ...
##  $ Formula.Charge           : chr  "Cu" "AlCu" "Mo" "Mo" ...
##  $ Formula.Discharge        : chr  "Al2Cu" "Al3Cu" "Al5Mo" "Al12Mo" ...
##  $ Max.Delta.Volume         : num  3.04 1.24 4.76 12.72 12.49 ...
##  $ Average.Voltage          : num  0.089 -0.0216 0.1228 0.0431 0.0292 ...
##  $ Gravimetric.Capacity     : num  1368 1113 1742 2299 1901 ...
##  $ Volumetric.Capacity      : num  5563 4419 7176 7346 7333 ...
##  $ Gravimetric.Energy       : num  121.8 -24 213.8 99.1 55.6 ...
##  $ Volumetric.Energy        : num  495.3 -95.4 880.9 316.8 214.4 ...
##  $ Atomic.Fraction.Charge   : num  0 0.5 0 0 0 ...
##  $ Atomic.Fraction.Discharge: num  0.667 0.75 0.833 0.923 0.923 ...
##  $ Stability.Charge         : num  0 0.0741 0.4115 0 0 ...
##  $ Stability.Discharge      : num  0 0.0962 0.0452 0.0114 0 ...
##  $ Steps                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Max.Voltage.Step         : num  0 0 0 0 0 0 0 0 0 0 ...
for (name in colnames(X))
{
  if(is.numeric(X[[name]]))
  {
    cat("\n")
    print(name)
    print(summary(X[[name]]))
  }
}
## 
## [1] "Max.Delta.Volume"
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00002   0.01747   0.04203   0.37531   0.08595 293.19322 
## 
## [1] "Average.Voltage"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -7.755   2.226   3.301   3.083   4.019  54.569 
## 
## [1] "Gravimetric.Capacity"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    5.176   88.108  130.691  158.291  187.600 2557.627 
## 
## [1] "Volumetric.Capacity"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.08  311.62  507.03  610.62  722.75 7619.19 
## 
## [1] "Gravimetric.Energy"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -583.5   211.7   401.8   444.1   614.4  5926.9 
## 
## [1] "Volumetric.Energy"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2208.1   821.6  1463.8  1664.0  2252.3 18305.9 
## 
## [1] "Atomic.Fraction.Charge"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.00000 0.03986 0.04762 0.90909 
## 
## [1] "Atomic.Fraction.Discharge"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.007407 0.086957 0.142857 0.159077 0.200000 0.993333 
## 
## [1] "Stability.Charge"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.03301 0.07319 0.14257 0.13160 6.48710 
## 
## [1] "Stability.Discharge"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.01952 0.04878 0.12207 0.09299 6.27781 
## 
## [1] "Steps"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.167   1.000   6.000 
## 
## [1] "Max.Voltage.Step"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.1503  0.0000 26.9607
rm(list = c("name"))
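
The checks in this section (variable types, missing values, per-column summaries) can also be written without an explicit loop. The following is only a sketch on a toy data frame standing in for `X`; the values in it are made up and one missing value is added on purpose.

```r
# Toy data frame standing in for X (values are illustrative, not from the dataset).
toy <- data.frame(
  Battery.ID      = c("a_Li", "b_Na", "c_Li"),
  Working.Ion     = c("Li", "Na", "Li"),
  Average.Voltage = c(3.2, 2.1, NA),
  Steps           = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Count the columns of each storage type (character, numeric, integer).
type_counts <- table(vapply(toy, function(col) class(col)[1], character(1)))

# Total number of missing values in the whole table.
n_missing <- sum(is.na(toy))

# One summary per numeric column, keyed by column name.
numeric_summaries <- lapply(Filter(is.numeric, toy), summary)

type_counts
n_missing
```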

4 Distribution of attributes

4.1 Density plots

** String variables ** The battery formula, formula charge and formula discharge attributes take many distinct values relative to the number of observations, whereas the working ion attribute takes only 11. The vast majority of the batteries studied have Li as the working ion (the main ion that transports electric charge). Other well-represented types in the dataset are calcium, magnesium, sodium and zinc batteries. There are also batteries with Al, Cs, K, Rb or Y as the working ion, but they account for only a marginal number of observations. Therefore, in the following analysis I combine them into the category Other.

** Continuous variables ** The density plot of each continuous attribute shows a high peak and a long tail. Among them, atomic fraction discharge is the most evenly distributed.

** Discrete variables ** Steps is the only discrete attribute in the dataset. There are very few observations with more than 1 step between full charge and discharge.
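
The grouping of rare working ions into Other, described under string variables above, can be sketched in base R as follows. The labels and counts are made up, and the cut-off of 2 observations is an assumption chosen only for this toy example; in the report the five main ions are listed by hand.

```r
# Illustrative working-ion labels (counts are made up, not taken from the dataset).
ions <- c(rep("Li", 6), rep("Ca", 3), rep("Mg", 2), "Cs", "Rb")

# Frequency of each working ion, most common first.
ion_counts <- sort(table(ions), decreasing = TRUE)

# Assumed cut-off for this toy example: keep ions with at least 2 observations.
main_ions <- names(ion_counts)[as.vector(ion_counts) >= 2]

# Replace every rare ion with the label "Other".
ions_grouped <- ifelse(ions %in% main_ions, ions, "Other")
table(ions_grouped)
```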

for (name in colnames(X))
{
  if(name == "Battery.ID" ) next
  
  threshold <- 10
  
  if(is.numeric(X[[name]]) && n_distinct(X[[name]]) > threshold)
  {
    plot(density(X[[name]]), main = name)
  }else if(n_distinct(X[[name]]) <= threshold){
    barplot(table(X[[name]]), main = name)
  }else{
    barplot(table(X[[name]]), main = paste(name, ": ", n_distinct(X[[name]]), " distinct values"), xaxt = 'n')
    # print(paste(name, ": ", n_distinct(X[[name]]), " distinct values"))
  }

}

rm(list = c("name", "threshold"))

4.2 Histogram panels

In order to gain more insight into the reasons for such a specific distribution of numerical variables, I present histograms of them, divided into panels according to working ion. The panels are presented for 6 categories: Li, Ca, Mg, Na, Zn and Other, which combines the rest of the observations. Note that for better visibility the panels do not show values below the 5th or above the 95th percentile of each attribute; they are filtered out by this line of code: xlim(quantile(mutated_X[,i], probs = c(0.05)), quantile(mutated_X[,i], probs = c(0.95))).
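
To make the effect of these limits concrete, here is a small base R sketch on a made-up skewed vector. It only mimics the filtering step; the plotting itself is left out.

```r
# Toy skewed attribute: most values are small, a few are extreme.
values <- c(1:95, 500, 800, 1200, 5000, 9000)

# Lower and upper display limits, computed as in the xlim() call.
limits <- quantile(values, probs = c(0.05, 0.95))

# Values inside the limits are drawn; the rest are dropped from the plot.
inside    <- values >= limits[[1]] & values <= limits[[2]]
n_dropped <- sum(!inside)

limits
n_dropped
```

This is also why ggplot2 reports removed rows for every panel plot: for most attributes roughly 10% of the 4351 observations fall outside the limits. Attributes with many tied values at a limit (for example many zeros) lose fewer rows, because values equal to a limit are kept.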

Because most observations are batteries with lithium as the working ion, the overall distribution plots are dominated by them. While some characteristics are distributed similarly in every battery category (max delta volume, stability charge, stability discharge), others have different distributions for batteries with different working ions. In the next section I analyse which characteristics describe different battery categories.

mutated_X <- X %>%
    mutate(Working.Ion.Other = ifelse(Working.Ion %in% c("Li", "Ca", "Mg", "Na", "Zn"), Working.Ion, "Other" ))

for (i in 6:17) {
  p <- mutated_X %>%
    ggplot(aes(x=mutated_X[,i])) +
      geom_histogram(bins = 20) +
      xlim(quantile(mutated_X[,i], probs = c(0.05)), quantile(mutated_X[,i], probs = c(0.95))) +
      facet_grid(cols = vars(Working.Ion.Other)) +
      xlab(colnames(mutated_X)[i])
  
  print(p)
}
## Warning: Removed 436 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 436 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 435 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 436 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 436 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 436 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 184 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 430 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 216 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 218 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 99 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 218 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

rm(list = c("i", "p"))

5 Characteristics depending on working ion

In this section I want to discover some characteristics of batteries with respect to the working ion. Because lithium batteries are overrepresented in the dataset, I create a sample with 200 observations for each battery type. Note that I still use the Other category to combine the underrepresented battery types.
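
The idea of a balanced sample can be sketched in base R as below. The data frame, the column names `ion` and `value`, and the sample size of 5 are assumptions for this toy example. Groups smaller than the requested size are kept whole, which is also how I understand dplyr's slice_sample() to behave when sampling without replacement.

```r
set.seed(379)

# Toy data: an over-represented group ("Li") and two smaller ones.
toy <- data.frame(
  ion   = c(rep("Li", 50), rep("Na", 8), rep("Other", 3)),
  value = seq_len(61)
)

n_per_group <- 5  # assumed sample size for this toy example (the report uses 200)

# Draw up to n_per_group rows from each group, then bind the pieces together.
balanced <- do.call(rbind, lapply(split(toy, toy$ion), function(group) {
  take <- min(n_per_group, nrow(group))  # small groups are kept whole
  group[sample(nrow(group), take), , drop = FALSE]
}))

table(balanced$ion)
```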

5.1 Histograms

Once again I use histograms to show the distribution of attribute values. Each category is now represented by a different colour instead of being on a separate panel. This helps me to better see the relationship between attribute values for different types of battery. Below I present my observations for each working ion.

** Lithium (Li) ** Lithium batteries are characterised by a high average voltage (between 3 and 5 volts). Most of them have a volumetric capacity value below 500 and a stability charge value below 1.2. The stability discharge of lithium batteries is in most cases also below 1.2.

** Calcium (Ca) ** Calcium batteries take values over a very wide range for almost all attributes, except stability charge, where they reach values below 1.5 in most cases, and atomic fraction discharge, where the values are also mostly below 1.5.

** Magnesium (Mg) ** Similar to calcium batteries, the variance for most attributes is high for magnesium batteries. However, there are some characteristics specific to this type of battery. Their average voltage is less than 4 and their maximum delta volume is less than 0.17.

** Sodium (Na) ** Sodium batteries are characterised by low volumetric (below 600) and gravimetric (below 200) capacities. They also have low values for charge stability (less than 2.0) and discharge stability (less than 1.5).

** Zinc (Zn) ** Zinc batteries have an average voltage of less than 3.0. Their volumetric energy is mostly below 2500 and their gravimetric energy below 600. They also achieve low values for atomic fraction discharge and stability discharge (below 2.0).

mutated_sample_X <- mutated_X %>% 
  group_by(Working.Ion.Other) %>% 
  slice_sample(n=200)

for (i in c(6:15,17)) {
  p <- mutated_sample_X %>%
    ggplot(aes(x=mutated_sample_X[[colnames(mutated_sample_X)[i]]], fill=Working.Ion.Other, color=Working.Ion.Other)) +
      geom_histogram(bins = 20, alpha = 0.5)  +
      xlim(quantile(mutated_X[[colnames(mutated_X)[i]]], probs = c(0.05)), quantile(mutated_X[[colnames(mutated_X)[i]]], probs = c(0.95))) +
      ylim(0,250) +
      xlab(colnames(mutated_sample_X)[i])
  
  print(p)
}
## Warning: Removed 146 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 139 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 129 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 147 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 143 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 161 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 30 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 119 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 66 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 69 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

## Warning: Removed 37 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).

rm(list = c("i", "p"))

5.2 Attribute importance analysis

My observations from the previous section can be supported by a variable importance plot. As can be seen on the graph, ‘stability discharge’, ‘volumetric energy’, ‘gravimetric energy’, ‘stability charge’ and ‘atomic fraction discharge’ are the 5 most important attributes in distinguishing batteries in terms of their working ion. All 5 attributes also appear in my conclusions from the histograms presented in the previous section.

rrfMod <- train(Working.Ion.Other ~ ., data=mutated_sample_X[,c(10:15,17:18)], method="RRF")
rrfImp <- varImp(rrfMod, scale=F)
plot(rrfImp, top = 5, main='Variable Importance')

rm(list = c("rrfImp", "rrfMod"))

6 Correlations between attributes

The next step is to analyse linear correlations between attributes. I only compute correlations for numerical variables. Note that I do not present results for the variables max delta volume and steps. According to the documentation, max delta volume is calculated from two other variables: stability charge and stability discharge. The steps variable has a very low variance: most observations have a steps value of 1.

6.1 Results

The correlation matrix below shows that gravimetric energy and volumetric energy are very strongly correlated with each other. Similarly, gravimetric capacity and volumetric capacity are very strongly correlated. Slightly weaker but still strong correlations are found between atomic fraction discharge and gravimetric capacity, atomic fraction discharge and volumetric capacity, and average voltage and gravimetric energy.
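
Strong pairs can also be read off a correlation matrix programmatically rather than by eye. The sketch below uses made-up attributes `a`, `b` and `c`; the threshold of 0.8 for a strong correlation is my assumption for this example and not a value used elsewhere in the report.

```r
set.seed(379)

# Toy attributes: b is almost a linear function of a, c is unrelated noise.
a   <- rnorm(200)
toy <- data.frame(a = a, b = 2 * a + rnorm(200, sd = 0.1), c = rnorm(200))

# Pearson correlation matrix, rounded to one decimal place as in the report.
corr <- round(cor(toy), 1)

# Keep each pair once (lower triangle) and list those with |r| >= 0.8.
idx <- which(lower.tri(corr) & abs(corr) >= 0.8, arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(corr)[idx[, "row"]],
  var2 = colnames(corr)[idx[, "col"]],
  r    = corr[idx]
)
strong_pairs
```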

mutated_sample_X %>%
  ungroup%>%
  select(c(Average.Voltage:Stability.Discharge, Max.Voltage.Step)) %>%
  cor() %>%
  round(1) %>%
  ggcorrplot(type = "lower", lab = TRUE)

Below I present 5 more correlation matrices. Each of them shows correlation values between attributes depending on the working ion in the battery.

** Lithium (Li) ** For lithium batteries, stability charge and stability discharge values are very strongly correlated with each other. Gravimetric energy is strongly correlated with gravimetric capacity. The same is true for volumetric energy and volumetric capacity.

** Calcium (Ca) ** As with lithium batteries, gravimetric capacity is strongly correlated with gravimetric energy, and volumetric energy with volumetric capacity. Interestingly, calcium batteries show an inverse correlation between average voltage and stability discharge. This is the only strong inverse correlation I have observed.

** Magnesium (Mg) ** Magnesium batteries have a perfect correlation (1.0 after rounding to one decimal place) between gravimetric energy and volumetric energy. Volumetric capacity and gravimetric capacity are also very strongly correlated with atomic fraction discharge.

** Sodium (Na) ** As with lithium batteries, stability charge and stability discharge are very strongly correlated for sodium batteries.

** Zinc (Zn) ** Zinc batteries have perfect correlations (1.0 after rounding to one decimal place) between gravimetric energy and volumetric energy, gravimetric capacity and volumetric capacity, and volumetric capacity and atomic fraction discharge.

for(ion in c("Li", "Ca", "Mg", "Na", "Zn"))
{
  p <- X %>%
    filter(Working.Ion == ion) %>%
    ungroup %>%
    select(c(Average.Voltage:Stability.Discharge, Max.Voltage.Step)) %>%
    cor() %>%
    round(1) %>%
    ggcorrplot(type = "lower", lab = TRUE) +
    ggtitle(ion)
  
  print(p)
}

rm(list = c("ion", "p"))
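
When only one attribute pair is of interest, the per-ion comparison can be reduced to a single number per group. This is a sketch on made-up data: the columns `x` and `y` are hypothetical and are built so that the relationship is strong for one group and absent for the other.

```r
set.seed(379)

# Toy data: x and y are strongly related for "Li" but unrelated for "Na".
x   <- rnorm(100)
toy <- data.frame(
  ion = rep(c("Li", "Na"), each = 50),
  x   = x,
  y   = c(x[1:50] + rnorm(50, sd = 0.05), rnorm(50))
)

# Correlation of the same attribute pair, computed separately per working ion.
r_by_ion <- sapply(split(toy, toy$ion), function(group) cor(group$x, group$y))
round(r_by_ion, 1)
```

A pair that looks only moderately correlated in the pooled data can therefore be strongly correlated within one battery type, which is why the matrices above are computed per ion.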

6.2 Plots

To better understand the correlations between attributes I plot graphs for the variables with the strongest associations.

mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Energy, y=Volumetric.Energy)) +
    geom_point(aes(color=Working.Ion.Other)) +
    geom_smooth(method = "gam") +
    geom_rug(aes(color=Working.Ion.Other))
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Capacity, y=Volumetric.Capacity)) +
    geom_point(aes(color=Working.Ion.Other)) +
    geom_smooth(method = "gam") +
    geom_rug(aes(color=Working.Ion.Other))
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Capacity, y=Atomic.Fraction.Discharge)) +
    geom_point(aes(color=Working.Ion.Other)) +
    geom_smooth(method = "gam") +
    geom_rug(aes(color=Working.Ion.Other))
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

mutated_sample_X %>%
  ggplot(aes(x=Volumetric.Capacity, y=Atomic.Fraction.Discharge)) +
    geom_point(aes(color=Working.Ion.Other)) +
    geom_smooth(method = "gam") +
    geom_rug(aes(color=Working.Ion.Other))
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Energy, y=Average.Voltage)) +
    geom_point(aes(color=Working.Ion.Other)) +
    geom_smooth(method = "gam") +
    geom_rug(aes(color=Working.Ion.Other))
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

To examine correlations between attributes for different battery types, I use interactive plots that allow the presented observations to be filtered by working ion.

The linear correlation between volumetric energy and gravimetric energy is strong for all battery types. For batteries with magnesium as the working ion, volumetric capacity grows linearly with gravimetric capacity until it reaches 4000.

p <- mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Energy, y=Volumetric.Energy)) +
    geom_point(aes(color=Working.Ion.Other))
ggplotly(p)
p <- mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Capacity, y=Volumetric.Capacity)) +
    geom_point(aes(color=Working.Ion.Other))
ggplotly(p)
p <- mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Capacity, y=Atomic.Fraction.Discharge)) +
    geom_point(aes(color=Working.Ion.Other))
ggplotly(p)
p <- mutated_sample_X %>%
  ggplot(aes(x=Volumetric.Capacity, y=Atomic.Fraction.Discharge)) +
    geom_point(aes(color=Working.Ion.Other))
ggplotly(p)
p <- mutated_sample_X %>%
  ggplot(aes(x=Gravimetric.Energy, y=Average.Voltage)) +
    geom_point(aes(color=Working.Ion.Other))
ggplotly(p)
rm(list = c("p"))

On the 3D graph I plot 3 of the 5 most important attributes presented in the Attribute importance analysis section. However, I do not observe any clearly separated groups on the plot.

plot_ly(mutated_sample_X, x=~Atomic.Fraction.Discharge, y=~Stability.Discharge, z=~Volumetric.Energy, type="scatter3d", color=~Working.Ion.Other)
## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode

7 Tendencies in battery materials research

Most of the observations presented in the dataset are for lithium batteries. There is also a focus on calcium, magnesium, sodium and zinc batteries. There are a few observations for other types of batteries, but for now these are marginal examples.

8 Battery type classifier

Because lithium batteries dominate the dataset, I try to build a classifier that predicts whether a battery is lithium or not.

li_X <- X %>%
    mutate(Li = ifelse(Working.Ion %in% c("Li"), 'Yes', 'No' )) %>%
    select(c(Average.Voltage:Stability.Discharge, Max.Voltage.Step, Li))

I split the data set into training and test data with a ratio of 9:1.

inTraining <-
  createDataPartition(y=li_X$Li, p=0.9, list=FALSE)

X_Train <- li_X[inTraining,]
X_Test <- li_X[-inTraining,]

rm(list = c("inTraining"))
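
As far as I understand, createDataPartition() samples within each class of the outcome, so both parts keep roughly the same share of lithium batteries. The base R sketch below illustrates that idea on made-up labels; it is not caret's implementation, and the class sizes are assumptions for the example.

```r
set.seed(379)

# Toy labels with unequal class sizes (illustrative, not the real counts).
labels <- c(rep("Yes", 60), rep("No", 40))

# For each class, pick 90% of its row indices for training.
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(rows) {
  rows[sample.int(length(rows), size = round(0.9 * length(rows)))]
}), use.names = FALSE)

# Everything that was not selected for training goes to the test part.
test_idx <- setdiff(seq_along(labels), train_idx)

# Class proportions are preserved in both parts.
prop.table(table(labels[train_idx]))
prop.table(table(labels[test_idx]))
```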

In training, I use the random forest method (limited to 10 trees via ntree = 10) with repeated cross-validation: 2 folds and 5 repetitions.

ctrl <- trainControl(
    method = "repeatedcv",
    number = 2,
    repeats = 5)
fit <- train(Li ~ .,
             data = X_Train,
             method = "rf",
             trControl = ctrl,
             ntree = 10)

fit
## Random Forest 
## 
## 3916 samples
##   10 predictor
##    2 classes: 'No', 'Yes' 
## 
## No pre-processing
## Resampling: Cross-Validated (2 fold, repeated 5 times) 
## Summary of sample sizes: 1958, 1958, 1958, 1958, 1958, 1958, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.8589888  0.7131421
##    6    0.8626149  0.7204872
##   10    0.8599591  0.7150745
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 6.
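
To illustrate what 2 folds repeated 5 times means, the sketch below generates the fold assignments by hand for a toy number of rows. It is a simplified illustration: caret additionally stratifies the folds by the outcome, which this sketch does not do.

```r
set.seed(379)

n_rows    <- 10  # toy number of training rows (the report has 3916)
n_folds   <- 2
n_repeats <- 5

# For every repetition, shuffle the rows and deal them into n_folds folds.
fold_assignments <- lapply(seq_len(n_repeats), function(rep_id) {
  sample(rep(seq_len(n_folds), length.out = n_rows))
})

# Each (repetition, fold) pair is one held-out set, so the model is fitted
# n_folds * n_repeats times for every candidate value of mtry.
n_fits_per_candidate <- n_folds * n_repeats
n_fits_per_candidate
```

With 2 folds, each model is trained on one half of the data and evaluated on the other half, which matches the sample sizes of 1958 (half of 3916) in the output above.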

I get an accuracy of 0.8713 on the test data set.

rfClasses <- predict(fit, newdata = X_Test)
confusionMatrix(data = rfClasses, as.factor(X_Test$Li))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  161  26
##        Yes  30 218
##                                           
##                Accuracy : 0.8713          
##                  95% CI : (0.8361, 0.9013)
##     No Information Rate : 0.5609          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.7381          
##                                           
##  Mcnemar's Test P-Value : 0.6885          
##                                           
##             Sensitivity : 0.8429          
##             Specificity : 0.8934          
##          Pos Pred Value : 0.8610          
##          Neg Pred Value : 0.8790          
##              Prevalence : 0.4391          
##          Detection Rate : 0.3701          
##    Detection Prevalence : 0.4299          
##       Balanced Accuracy : 0.8682          
##                                           
##        'Positive' Class : No              
##
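
The main statistics in this output can be recomputed by hand from the four cells of the confusion matrix. The sketch below does so in base R; the variable names are my own, and the positive class is taken to be No, as stated in the output.

```r
# Confusion matrix from the output above: rows are predictions, columns the reference.
cm <- matrix(c(161, 30, 26, 218), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

total       <- sum(cm)
accuracy    <- sum(diag(cm)) / total               # share of correct predictions
sensitivity <- cm["No", "No"] / sum(cm[, "No"])     # positive class is "No"
specificity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])

# Cohen's kappa: agreement beyond what the row and column totals alone would give.
expected    <- sum(rowSums(cm) * colSums(cm)) / total^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, kappa = cohen_kappa), 4)
```

The rounded results agree with the Accuracy, Sensitivity, Specificity and Kappa lines reported by confusionMatrix().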